-
There is a set of primordial features and functions expected of any modern information system: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload, and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof-of-principle realization of this set of primordial capabilities would advance the vision for their continued development. Here, we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50 µm diameter, highly porous, hierarchically branched, colloidal substrate particles composed of naturally abundant cellulose acetate. Their surface areas are over 200 cm²/mg, with binding capacities of over 10^12 DNA oligos/mg, 10 terabytes/mg, or 10^4 terabytes/cm³. This “dendricolloid” holds DNA files more stably than bare DNA, with an extrapolated ability to be repeatedly lyophilized and rehydrated over 170 times, compared to 60 times for bare DNA. Accelerated aging studies project half-lives of ~6000 and 2 million years at 4 °C and -18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcribing from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3x3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.
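The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes simple first-order (exponential) decay, which the abstract does not state, and the function and constant names are hypothetical. It also derives the bulk density implied by the two capacity figures (10 terabytes/mg and 10^4 terabytes/cm³).

```python
# Illustrative arithmetic on the abstract's figures.
# ASSUMPTION: first-order (exponential) decay; the abstract only gives half-lives.

HALF_LIFE_4C_YEARS = 6_000            # "~6000 years at 4 °C" (from the abstract)
HALF_LIFE_MINUS18C_YEARS = 2_000_000  # "2 million years at -18 °C" (from the abstract)

TB_PER_MG = 10      # capacity per mass (from the abstract)
TB_PER_CM3 = 1e4    # capacity per volume (from the abstract)


def fraction_intact(years: float, half_life_years: float) -> float:
    """Fraction of stored DNA expected to remain after `years` under first-order decay."""
    return 0.5 ** (years / half_life_years)


# Bulk density implied by the two capacity figures: (TB/cm3) / (TB/mg) = mg/cm3.
implied_mg_per_cm3 = TB_PER_CM3 / TB_PER_MG

if __name__ == "__main__":
    for years in (100, 1_000, 10_000):
        print(
            f"{years:>6} years: "
            f"4 C -> {fraction_intact(years, HALF_LIFE_4C_YEARS):.4f}, "
            f"-18 C -> {fraction_intact(years, HALF_LIFE_MINUS18C_YEARS):.6f}"
        )
    print(f"implied bulk density: {implied_mg_per_cm3:.0f} mg/cm3")
```

Under these assumptions, the two capacity figures are mutually consistent with a bulk density of about 1000 mg/cm³ (1 g/cm³).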
-
Abstract. Motivation: DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components. Results: We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using multiple sequence alignment algorithms and others that do not. We found that the choice to include multiple sequence alignment in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems. Availability and implementation: The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).
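To make the idea of fault injection over strand copies concrete, here is a minimal sketch. It does not use FrameD's API: every function name and parameter below is hypothetical. It injects substitution errors only and recovers the strand with a per-position majority vote, that is, the "no alignment" style of design. Insertions and deletions would shift positions and break this column-wise vote, which is the situation where multiple sequence alignment becomes relevant.

```python
import random
from collections import Counter

BASES = "ACGT"


def inject_substitutions(strand: str, rate: float, rng: random.Random) -> str:
    """Return a copy of `strand` where each base is substituted with probability `rate`."""
    out = []
    for base in strand:
        if rng.random() < rate:
            # Replace with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def majority_consensus(copies: list[str]) -> str:
    """Column-wise majority vote; assumes all copies have equal length (no indels)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def run_trial(strand_len: int = 100, n_copies: int = 5,
              rate: float = 0.05, seed: int = 0) -> int:
    """One fault-injection iteration; returns the number of wrong bases in the consensus."""
    rng = random.Random(seed)
    original = "".join(rng.choice(BASES) for _ in range(strand_len))
    copies = [inject_substitutions(original, rate, rng) for _ in range(n_copies)]
    consensus = majority_consensus(copies)
    return sum(a != b for a, b in zip(original, consensus))


if __name__ == "__main__":
    # Repeat the trial with different seeds, as a simulator would over many iterations.
    errors = [run_trial(seed=s) for s in range(20)]
    print("residual errors per trial:", errors)
```

Sweeping `rate` and `n_copies` in a loop like this is one simple way to see how the benefit of extra strand copies depends on the error rate.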
-
Abstract. The physical architectures of information storage systems often dictate how information is encoded, databases are organized, and files are accessed. Here we show that a simple architecture comprising a T7 promoter and a single-stranded overhang domain (ss-dsDNA) can unlock dynamic DNA-based information storage with powerful capabilities and advantages. The overhang provides a physical address for accessing specific DNA strands as well as implementing a range of in-storage file operations. It increases theoretical storage densities and capacities by expanding the encodable sequence space and simplifies the computational burden in designing sets of orthogonal file addresses. Meanwhile, the T7 promoter enables repeatable information access by transcribing information from DNA without destroying it. Furthermore, saturation mutagenesis around the T7 promoter and systematic analyses of environmental conditions reveal design criteria that can be used to optimize information access. This simple but powerful ss-dsDNA architecture lays the foundation for information storage with versatile capabilities.
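As a rough illustration of what designing a set of orthogonal file addresses can involve computationally, the sketch below greedily picks short address sequences that differ from each other, and from each other's reverse complements, in at least a minimum number of positions. This is not the authors' method: Hamming distance is a simplified stand-in for real hybridization criteria, and all names and parameters are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq[::-1].translate(_COMPLEMENT)


def pick_addresses(length: int, min_dist: int, limit: int) -> list[str]:
    """Greedily select up to `limit` addresses of `length` bases.

    ASSUMPTION: two addresses count as 'orthogonal' if they, and each other's
    reverse complements, differ in at least `min_dist` positions.
    """
    chosen: list[str] = []
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if all(
            hamming(candidate, c) >= min_dist
            and hamming(candidate, revcomp(c)) >= min_dist
            for c in chosen
        ):
            chosen.append(candidate)
            if len(chosen) == limit:
                break
    return chosen


if __name__ == "__main__":
    for address in pick_addresses(length=6, min_dist=3, limit=8):
        print(address)
```

The pairwise checks grow quadratically with the number of addresses, which gives a sense of why an architecture that eases this design burden is attractive.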
